Local Alignment of RNA Sequences with Arbitrary Scoring Schemes
Abstract
Local similarity is an important tool in the comparative analysis of biological sequences, and is therefore well studied. In particular, the Smith-Waterman technique and its normalized version are two established metrics for measuring local similarity in strings. In RNA sequences, however, where one must consider not only sequential but also structural features of the inspected molecules, the concept of local similarity becomes more complicated. First, even in the global setting, computing sequence-structure alignments is more difficult than computing standard sequence alignments, because the information is two-dimensional. Second, one can view locality in two different ways, in the sequential or the structural sense, leading to different problem formulations. In this paper we introduce two sequentially-local similarity metrics for comparing RNA sequences. These metrics combine the global RNA alignment metric of Shasha and Zhang [16] with the Smith-Waterman metric [17] and its normalized version [2] used in strings. We generalize the familiar alignment graph used in string comparison so that it also applies to RNA sequences, and then use this generalization to devise two algorithms for computing local similarity according to our two suggested metrics. Our algorithms run in O(m²n lg n) and O(m²n lg n + n²m) time respectively, where m ≤ n are the lengths of the two given RNAs. Both algorithms work with arbitrary scoring schemes.
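For readers unfamiliar with the string-only baseline that the abstract builds on, the following is a minimal sketch of the classic Smith-Waterman local similarity score for plain strings. It is not the RNA algorithm of the paper, which also accounts for structure. The function names, the linear gap penalty, and the example scoring function are assumptions made for illustration; the scoring scheme is passed in as a function to reflect the idea of an arbitrary scheme.

```python
from typing import Callable, Tuple


def smith_waterman(a: str, b: str,
                   score: Callable[[str, str], float],
                   gap: float) -> Tuple[float, int, int]:
    """Classic string-only local alignment score (illustrative sketch).

    `score(x, y)` is a caller-supplied substitution score and `gap` is a
    (negative) linear gap penalty; both are assumptions of this sketch.
    Returns (best score, end position in a, end position in b), where the
    positions are 1-based and (0, 0) means no positive-scoring alignment.
    """
    m, n = len(a), len(b)
    prev = [0.0] * (n + 1)          # row i-1 of the DP table
    best, best_i, best_j = 0.0, 0, 0
    for i in range(1, m + 1):
        curr = [0.0] * (n + 1)      # row i of the DP table
        for j in range(1, n + 1):
            curr[j] = max(
                0.0,                                      # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                            # gap in b
                curr[j - 1] + gap,                        # gap in a
            )
            if curr[j] > best:
                best, best_i, best_j = curr[j], i, j
        prev = curr
    return best, best_i, best_j


def simple_score(x: str, y: str) -> float:
    """Hypothetical scoring scheme: +2 for a match, -1 for a mismatch."""
    return 2.0 if x == y else -1.0


if __name__ == "__main__":
    print(smith_waterman("GGACGUCC", "UUACGUAA", simple_score, gap=-2.0))
```

Clamping each cell at zero is what makes the score local: any prefix with a negative contribution is discarded, so the best cell marks the end of the highest-scoring pair of substrings. The table has (m+1)(n+1) cells, so this baseline takes O(mn) time.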
Related resources
Estimating the Gumbel Scale Parameter for Local Alignment of Random Sequences by Importance Sampling with Stopping Times.
The gapped local alignment score of two random sequences follows a Gumbel distribution. If computers could estimate the parameters of the Gumbel distribution within one second, the use of arbitrary alignment scoring schemes could increase the sensitivity of searching biological sequence databases over the web. Accordingly, this article gives a novel equation for the scale parameter of the relev...
FRESCO: Flexible Alignment with Rectangle Scoring Schemes
While the popular DNA sequence alignment tools incorporate powerful heuristics to allow for fast and accurate alignment of DNA, most of them still optimize the classical Needleman-Wunsch scoring scheme. The development of novel scoring schemes is often hampered by the difficulty of finding an optimizing algorithm for each non-trivial scheme. In this paper we define the broad class of rectangle ...
Convergent Island Statistics: a fast method for determining local alignment score significance
MOTIVATION: Background distribution statistics for profile-based sequence alignment algorithms cannot be calculated analytically, and hence such algorithms must resort to measuring the significance of an alignment score by assessing its location among a distribution of background alignment scores. The Gumbel parameters that describe this background distribution are usually pre-computed for a lim...
Locality and Gaps in RNA Comparison
Locality is an important and well-studied notion in comparative analysis of biological sequences. Similarly, taking into account affine gap penalties when calculating biological sequence alignments is a well-accepted technique for obtaining better alignments. When dealing with RNA, one has to take into consideration not only sequential features, but also structural features of the inspected mol...
Sequence Alignment Guided By Common Motifs Described By Context Free Grammars
We introduce a new problem, context-free grammars (CFG)-guided pairwise sequence alignment, whose most immediate application is the alignment of RNA sequences that share motifs described by context-free grammars. Such motifs include common RNA secondary (sub)structures (such as stem-loops) that are recognizable in sequences. The problem aims to align given sequences by including, from a given s...
Publication date: 2006